74 research outputs found

Genome-wide inference of ancestral recombination graphs

Author: Gronau Ilan
Hubisz Melissa J.
Rasmussen Matthew D.
Siepel Adam
Publication date: 01/01/2013

The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the "ancestral recombination graph" (ARG), a complete record of coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are computationally intensive, highly approximate, or limited to small numbers of sequences, and, as a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to apply to dozens of complete mammalian genomes. The key idea of our approach is to sample an ARG of n chromosomes conditional on an ARG of n-1 chromosomes, an operation we call "threading." Using techniques based on hidden Markov models, we can perform this threading operation exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated application of these threading operations results in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the true posterior distribution and is effective in recovering various features of the ARG for dozens of sequences generated under realistic parameters for human populations. In applications of ARGweaver to 54 human genome sequences from Complete Genomics, we find clear signatures of natural selection, including regions of unusually ancient ancestry associated with balancing selection and reductions in allele age in sites under directional selection. Preliminary results also indicate that our methods can be used to gain insight into complex features of human population structure, even with a noninformative prior distribution.
Comment: 88 pages, 7 main figures, 22 supplementary figures. This version contains a substantially expanded genomic data analysis

arXiv.org e-Print Archive

CiteSeerX

Cold Spring Harbor Laboratory Institutional Repository

Directory of Open Access Journals

PubMed Central

FigShare

Whole-genome sequence analysis shows that two endemic species of North American wolf are admixtures of the coyote and gray wolf.

Author: Cahill James A
Fan Zhenxin
Gronau Ilan
Pollinger John P
Robinson Jacqueline
Shapiro Beth
vonHoldt Bridgett M
Wall Jeff
Wayne Robert K
Publication venue: eScholarship, University of California
Publication date: 01/07/2016

Protection of populations comprising admixed genomes is a challenge under the Endangered Species Act (ESA), which is regarded as the most powerful species protection legislation ever passed in the United States but lacks specific provisions for hybrids. The eastern wolf is a newly recognized wolf-like species that is highly admixed and inhabits the Great Lakes and eastern United States, a region previously thought to be included in the geographic range of only the gray wolf. The U.S. Fish and Wildlife Service has argued that the presence of the eastern wolf, rather than the gray wolf, in this area is grounds for removing ESA protection (delisting) from the gray wolf across its geographic range. In contrast, the red wolf from the southeastern United States was one of the first species protected under the ESA and was protected despite admixture with coyotes. We use whole-genome sequence data to demonstrate a lack of unique ancestry in eastern and red wolves that would not be expected if they represented long divergent North American lineages. These results suggest that arguments for delisting the gray wolf are not valid. Our findings demonstrate how a strict designation of a species under the ESA that does not consider admixture can threaten the protection of endangered entities. We argue for a more balanced approach that focuses on the ecological context of admixture and allows for evolutionary processes to potentially restore historical patterns of genetic variation.

Princeton University Open Access Repository

eScholarship - University of California

Adaptive Distance Measures for Resolving K2P Quartets: Metric Separation versus Stochastic Noise

Author: Buneman P.
Ilan Gronau
Irad Yavneh
Neymann J.
Saitou N.
Shlomo Moran
Publication venue: 'Mary Ann Liebert Inc'

Crossref

Optimal implementations of UPGMA and other common clustering algorithms

Author: Akella
Barthelemy
Benzécri
Day
Du
Elias
Eppstein
Gronau
Have
Ilan Gronau
Juan
Křivánek
Murtagh
Olson
Saitou
Shlomo Moran
Sibson
Sneath
Publication venue: 'Elsevier BV'

Crossref

Recursive construction of perfect DNA molecules from imperfect oligonucleotides

Author: Aho AV
Alsuwaiyel MH
Chomsky N
Drexler KE
Ehud Shapiro
Gregory Linshiz
Hecker KH
Hopcroft JE
Hutchison CA
Ilan Gronau
Mandelbrot BB
Rivka Adar
Shai Kaplan
Sivan Ravid
Tuval Ben Yehezkel
Publication venue: Nature Publishing Group

Making faultless complex objects from potentially faulty building blocks is a fundamental challenge in computer engineering, nanotechnology and synthetic biology. Here, we show for the first time how recursion can be used to address this challenge and demonstrate a recursive procedure that constructs error-free DNA molecules and their libraries from error-prone oligonucleotides. Divide and Conquer (D&C), the quintessential recursive problem-solving technique, is applied in silico to divide the target DNA sequence into overlapping oligonucleotides short enough to be synthesized directly, albeit with errors; error-prone oligonucleotides are recursively combined in vitro, forming error-prone DNA molecules; error-free fragments of these molecules are then identified, extracted and used as new, typically longer and more accurate, inputs to another iteration of the recursive construction procedure; the entire process repeats until an error-free target molecule is formed. Our recursive construction procedure surpasses existing methods for de novo DNA synthesis in speed, precision, amenability to automation, ease of combining synthetic and natural DNA fragments, and ability to construct designer DNA libraries. It thus provides a novel and robust foundation for the design and construction of synthetic biological molecules and organisms.
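
As a rough illustration of the in silico "divide" step only, the sketch below recursively splits a target sequence into overlapping fragments no longer than a chosen synthesis limit, then rejoins them to confirm that the overlaps are consistent. It is not the authors' implementation: the function names `divide` and `assemble`, the midpoint split rule, and the parameters `max_len` and `overlap` are all assumptions made for this example, and the in vitro error detection and correction described in the abstract are not modeled.

```python
import random


def divide(seq: str, max_len: int, overlap: int) -> list[str]:
    """Recursively split seq into fragments of at most max_len bases.

    Hypothetical rule for this sketch: split at the midpoint and let the two
    halves share exactly `overlap` bases, so neighbours can be rejoined.
    """
    if overlap < 0 or max_len <= 2 * overlap:
        raise ValueError("max_len must exceed twice the (non-negative) overlap")
    if len(seq) <= max_len:
        return [seq]  # short enough to be "synthesized directly"
    mid = len(seq) // 2
    half = overlap // 2
    left = seq[: mid + half]                # extends past the midpoint
    right = seq[mid - (overlap - half):]    # starts before the midpoint
    return divide(left, max_len, overlap) + divide(right, max_len, overlap)


def assemble(fragments: list[str], overlap: int) -> str:
    """Rejoin consecutive fragments that share `overlap` bases at their ends."""
    result = fragments[0]
    for frag in fragments[1:]:
        if not result.endswith(frag[:overlap]):
            raise ValueError("adjacent fragments do not overlap as expected")
        result += frag[overlap:]
    return result


# Small demonstration on a random 200-base target (fixed seed for repeatability).
target = "".join(random.Random(0).choices("ACGT", k=200))
fragments = divide(target, max_len=40, overlap=10)
reassembled = assemble(fragments, overlap=10)
```

In this sketch every fragment is at most `max_len` bases long and adjacent fragments share exactly `overlap` bases, which is what lets `assemble` rebuild the target by simple concatenation.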

Crossref

PubMed Central

Inference of Natural Selection from Interspersed Genomic Elements Based on Polymorphism and Divergence

Author: 1000 Genomes Project Consortium
Adam Siepel
Andolfatto
Bierne
Boffelli
Boyko
Bresnick
Bustamante
Charlesworth
Chernoff
Clark
Cooper
Dermitzakis
Dore
Drmanac
Dunham
Eyre-Walker
Fay
Gerstein
Guttman
Harrow
Hernandez
Hubisz
Ilan Gronau
Jaaved Mohammed
Ko
Kondrashov
Lai
Lehmann
Leonardo Arbiza
Mackay
Marques
Matera
McDonald
Merika
Moses
Nielsen
Okamura
Pang
Pollard
Roy
Sawyer
Self
Siepel
Smith
Stark
Stoletzki
Thomas
Ulitsky
Watterson
Williamson
Wilson
Yang
Yi
Zhang
Publication venue: 'Oxford University Press (OUP)'
Publication date: 05/02/2013

Complete genome sequences contain valuable information about natural selection, but extracting this information for short, widely scattered noncoding elements remains a challenging problem. Here we introduce a new computational method for addressing this problem called Inference of Natural Selection from Interspersed Genomically coHerent elemenTs (INSIGHT). INSIGHT uses a generative probabilistic model to contrast patterns of polymorphism and divergence in the elements of interest with those in flanking neutral sites, pooling weak information from many short elements in a manner that accounts for variation among loci in mutation rates and genealogical backgrounds. The method is able to disentangle the contributions of weak negative, strong negative, and positive selection based on their distinct effects on patterns of polymorphism and divergence. Information about divergence is obtained from multiple outgroup genomes using a full phylogenetic model. The model is efficiently fitted to genome-wide data by decomposing the maximum likelihood estimation procedure into three straightforward stages. The key selection-related parameters are estimated by expectation maximization. Using simulations, we show that INSIGHT can accurately estimate several parameters of interest even in complex demographic scenarios. We apply our methods to noncoding RNAs, promoter regions, and transcription factor binding sites in the human genome, and find clear evidence of natural selection. We also present a detailed analysis of particular nucleotide positions within GATA2 binding sites and primary micro-RNA transcripts.
Comment: 21-page manuscript, 4 figures, 4 tables + 3 supp figures + 3 supp tables + supp methods. V4: additional results on human noncoding RNAs annotated by GENCODE + refinement of previous versions + additional supplementary material included to main document. V5: some minor modifications. V6: this is an electronic version of an article published in Mol Biol Evol, 201

arXiv.org e-Print Archive

Crossref

Cold Spring Harbor Laboratory Institutional Repository

PubMed Central

A community-maintained standard library of population genetic models

Author: Adrion Jeffrey R.
Baumdicker Franz
Carlson Jedidiah
Cartwright Reed A.
Cole Christopher B.
Dukler Noah
Durvasula Arun
Galloway Jared G.
Gladstein Ariella L.
Gower Graham
Gravel Simon
Gronau Ilan
Gutenkunst Ryan N.
Kelleher Jerome
Kern Andrew D.
Kim Bernard Y.
Kyriazis Christopher C.
Lohmueller Kirk E.
McKenzie Patrick
Messer Philipp W.
Noskova Ekaterina
Ortega-Del Vecchyo Diego
Racimo Fernando
Ragsdale Aaron P.
Ralph Peter L.
Schrider Daniel R.
Siepel Adam
Struck Travis J.
Tsambos Georgia
Publication venue: 'eLife Sciences Publications, Ltd'
Publication date: 01/01/2020

The explosion in population genomic data demands ever more complex modes of analysis, and increasingly, these analyses depend on sophisticated simulations. Recent advances in population genetic simulation have made it possible to simulate large and complex models, but specifying such models for a particular simulation engine remains a difficult and error-prone task. Computational genetics researchers currently re-implement simulation models independently, leading to inconsistency and duplication of effort. This situation presents a major barrier to empirical researchers seeking to use simulations for power analyses of upcoming studies or sanity checks on existing genomic data. Population genetics, as a field, also lacks standard benchmarks by which new tools for inference might be measured. Here, we describe a new resource, stdpopsim, that attempts to rectify this situation. Stdpopsim is a community-driven open source project, which provides easy access to a growing catalog of published simulation models from a range of organisms and supports multiple simulation engine backends. This resource is available as a well-documented python library with a simple command-line interface. We share some examples demonstrating how stdpopsim can be used to systematically compare demographic inference methods, and we encourage a broader community of developers to contribute to this growing resource.

Copenhagen University Research Information System

The University of Arizona

Genome sequencing highlights the dynamic early history of dogs

Author: Adam H Freedman
Adam R Boyko
Adam Siepel
Alan Wilton
Belen Lorente-Galdos
Can Alkan
Carles Vilà
Carlos D Bustamante
Clarence Lee
Diego Ortega-Del Vecchyo
Elaine A Ostrander
Eli Geffen
Eunjung Han
Farhad Hormozdiari
Heidi G Parker
Holly Beale
Ilan Gronau
John Novembre
Josip Kusak
Kevin Squire
Marco Galaverni
Oscar Ramirez
Pedro M Silva
Peter Marx
Rena M Schweizer
Robert K Wayne
Stanley F Nelson
Timothy T Harkins
Tomas Marques-Bonet
Vasisht Tadigotla
Zhenxin Fan
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2014

To identify genetic changes underlying dog domestication and reconstruct their early evolutionary history, we generated high-quality genome sequences from three gray wolves, one from each of the three putative centers of dog domestication, two basal dog lineages (Basenji and Dingo) and a golden jackal as an outgroup. Analysis of these sequences supports a demographic model in which dogs and wolves diverged through a dynamic process involving population bottlenecks in both lineages and post-divergence gene flow. In dogs, the domestication bottleneck involved at least a 16-fold reduction in population size, a much more severe bottleneck than estimated previously. A sharp bottleneck in wolves occurred soon after their divergence from dogs, implying that the pool of diversity from which dogs arose was substantially larger than represented by modern wolf populations. We narrow the plausible range for the date of initial dog domestication to an interval spanning 11-16 thousand years ago, predating the rise of agriculture. In light of this finding, we expand upon previous work regarding the increase in copy number of the amylase gene (AMY2B) in dogs, which is believed to have aided digestion of starch in agricultural refuse. We find standing variation for amylase copy number variation in wolves and little or no copy number increase in the Dingo and Husky lineages. In conjunction with the estimated timing of dog origins, these results provide additional support to archaeological finds, suggesting the earliest dogs arose alongside hunter-gatherers rather than agriculturists. Regarding the geographic origin of dogs, we find that, surprisingly, none of the extant wolf lineages from putative domestication centers is more closely related to dogs, and, instead, the sampled wolves form a sister monophyletic clade. This result, in combination with dog-wolf admixture during the process of domestication, suggests that a re-evaluation of past hypotheses regarding dog origins is necessary.

Cold Spring Harbor Laboratory Institutional Repository

Bilkent University Institutional Repository

Directory of Open Access Journals

PubMed Central

Digital.CSIC

UPF Digital Repository

FigShare

Expanding the stdpopsim species catalog, and lessons learned for realistic genome simulations

Simulation is a key tool in population genetics for both methods development and empirical research, but producing simulations that recapitulate the main features of genomic datasets remains a major obstacle. Today, more realistic simulations are possible thanks to large increases in the quantity and quality of available genetic data, and the sophistication of inference and simulation software. However, implementing these simulations still requires substantial time and specialized knowledge. These challenges are especially pronounced for simulating genomes for species that are not well-studied, since it is not always clear what information is required to produce simulations with a level of realism sufficient to confidently answer a given question. The community-developed framework stdpopsim seeks to lower this barrier by facilitating the simulation of complex population genetic models using up-to-date information. The initial version of stdpopsim focused on establishing this framework using six well-characterized model species (Adrion et al., 2020). Here, we report on major improvements made in the new release of stdpopsim (version 0.2), which includes a significant expansion of the species catalog and substantial additions to simulation capabilities. Features added to improve the realism of the simulated genomes include non-crossover recombination and provision of species-specific genomic annotations. Through community-driven efforts, we expanded the number of species in the catalog more than threefold and broadened coverage across the tree of life. During the process of expanding the catalog, we have identified common sticking points and developed the best practices for setting up genome-scale simulations. We describe the input data required for generating a realistic simulation, suggest good practices for obtaining the relevant information from the literature, and discuss common pitfalls and major considerations. These improvements to stdpopsim aim to further promote the use of realistic whole-genome population genetic simulations, especially in non-model organisms, making them available, transparent, and accessible to everyone.

Publikationer från Uppsala Universitet

Edinburgh Research Explorer

Digitala Vetenskapliga Arkivet - Academic Archive On-line